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OPTIMAL SCALINGS FOR LOCAL METROPOLIS HASTINGS 
CHAINS ON NONPRODUCT TARGETS IN HIGH DIMENSIONS^ 

By Alexandros Beskos, Gareth Roberts and Andrew Stuart 

University of Warwick 

We investigate local MCMC algorithms, namely the random-walk 
Metropolis and the Langevin algorithms, and identify the optimal 
choice of the local step-size as a function of the dimension n of the 
state space, asymptotically as n ^ oo. We consider target distribu- 
tions defined as a change of measure from a product law. Such struc- 
tures arise, for instance, in inverse problems or Bayesian contexts 
when a product prior is combined with the likelihood. We state an- 
alytical results on the asymptotic behavior of the algorithms under 
general conditions on the change of measure. Our theory is moti- 
vated by applications on conditioned diffusion processes and inverse 
problems related to the 2D Navier-Stokes equation. 

1. Introduction. The Markov chain Monte Carlo (MCMC) methodology 
provides a flexible approach for simulating high-dimensional distributions 
appearing, for instance, as posterior information in a Bayesian context [15] or 
as integrators in importance sampling methods [14]. This paper is concerned 
with local MCMC algorithms, namely the random-walk Metropolis (RWM) 
and the Langevin algorithms. Our objective is to investigate the optimal 
choice of the local step-size as a function of the dimension n of the state 
space, asymptotically as n — > oo. In particular, we examine if the step-size 
should diminish with n and, if so, at what rate. 

The results in this paper extend significantly those in [21] and [22] since we 
here consider a family of nonproduct target laws. Our theory covers prac- 
tical, involved probabilistic models: we will consider conditioned diffusion 
processes and inverse problems related to the 2D Navier-Stokes equation. 
In both these cases, the target measure is a change of measure from a Gaus- 
sian law on some appropriate infinite-dimensional Hilbert space. Gaussian 
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laws on abstract Hilbert spaces correspond to product Gaussian densities 
via the Karhunen-Loeve expansion. This motivates the setting in which we 
present our results: target distributions defined as change of measure from 
a product law. 

Consider the target probability density 7r„ : M" ^ M defined w.r.t. the 
Lebesque measure dx. The Metropolis-Hastings theory defines a Markov 
chain reversible w.r.t. vr^. Under the assumption of ergodicity, a path of the 
chain will provide, once convergence to stationarity is achieved, correlated 
draws from 7r„. The dynamics of the chain are developed as follows. Given 
the current location x E M" a move to y is proposed according to some user- 
specified transition kernel qn{x,dy) =qn{x,y)dy. Reversibility w.r.t. 7r„ is 
then guaranteed [18] if the chain moves from x — > x' according to the rule: 

^/ ^ f y, with probability an{x, y), 
\ x, otherwise, 

where the acceptance probability is given by 

n 9^ „c^,A_JlA — -- — -, if 7r„(x g„ x,y >0, 

an[x,y)-< TTn{x)qn{x,y) 

[O, if 7rn{x)qn{x,y) = 0. 

Ergodicity holds under regularity conditions on 7r„, g„; see, for instance, 
[15]. 

In this paper, 7r„ will be a change of measure from a reference product 
law, denoted by 7f„. In particular, the family of targets is specified as follows: 



(1.3a) ^^{x) =exp{-(j)n{x)} 

d-Kn 

for some ;S(M"')-measurable mapping with vr^ having Lebesque density: 

" 1 / \ " 1 
(1.3b) (:,) = [] / = [] exp{-g(x./A.)} 



\'j J ■ -t Ay 

for appropriate /, ^:Mi-^]R. Motivated by applications, the standard devia- 
tions are assumed to satisfy 

(1.3c) Ai=r'', i = l,2,..., 

for some n > 0.^ We investigate MCMC algorithms corresponding to the 
following local-move proposals: 

(1.4) RWM: y = X + cj„Z, 

2 

(1.5) SLA: y = x + ^Vlog7r„(x)+c7„Z 



^We later relax this assumption to allow for algebraic decay at rate i see (2.6). 
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with Z ~ AA(0,/„) and step-size ct„ > 0. The first corresponds to a random- 
walk update; the second to a simphfied version of the standard Metropolis- 
adjusted Langevin algorithm (MALA) with proposal: 



We will show that using only the reference law vf^ in SLA (which stands 
for "simplified Langevin algorithm") does not reduce the asymptotic prop- 
erties of the resulting algorithm. The objective is the identification of the 
"appropriate" scaling of function of the dimension n, for large n. 

Excessively large or small an will typically result in unacceptably small or 
high acceptance probabilities, respectively, and to poor ergodicity and mix- 
ing properties for the MCMC. 

References [21] and [22] analyze RWM and MALA as applied to i.i.d. tar- 
gets corresponding to our 7r„ with k = 0. They show that step-sizes should be 
arranged as al = 0{n~^) for RWM and = 0{n~^/^) for MALA. Exten- 
sions of such results have thereafter appeared in several papers; see [1, 2] and 
[20] for product targets. For nonproduct targets, [8] examines an exchange- 
able target, [9] a local interaction model related with Gibbs distributions 
and [24] elliptically symmetric targets. See [3] for an analytical review. Us- 
ing a different approach, we look at the structure of the nonproduct target 
TTn in (1.3) and present general, analytical conditions on <pfi under wliich. we 
show that one should select = 0{n-'^'^~'^) for RWM and al = 0(n"2«-V3) 
for SLA. We will use the average squared-jump-distance as an index of the 
efficiency of MCMC algorithms, as it allows for transparent, explicit cal- 
culations. Our analysis is considerably simpler than the approach adopted 
in the preceding papers and, as we will show, the results are relevant for 
probability measures arising in practical applications. 

A motivation for investigating the change of measure (1.3) are cases when 
the densities TTn, vTn under consideration are finite-dimensional approxima- 
tions of some in/inite-dimensional measures vr, vr related through the density 



on an appropriate space. In this paper we substantiate the intuition that the 
presence of absolute continuity in the limit n — > oo implies similar asymptotic 
behavior for the MCMC between the product and nonproduct scenarios. 

The structure of the paper is as follows. In Section 2 we state and prove 
the scaling results for RWM and SLA. In Sections 3, 4 and 5 we present 
applications of the theory in the context of conditioned diffusion processes 
and Navier-Stokes inverse problems. In Section 6 we show some additional 
ideas and algorithms which can deliver improved results. We finish with some 
conclusions in Section 7. Proofs are collected in an Appendix B; before that, 
we have collected some Taylor expansion results needed in the proofs in 
Appendix A. 



(1.6) 




(1.7) 



— (X) = exp{-</.(X)} 
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(2.3) 



2. The main results. We will state rigorously the scaling results outlined 
in the Introduction. For both RWM and SLA, as applied to our target vr^ 
in (1.3), the acceptance probability in (1.2) can be written as 

(2.1) a„(x,y) = 1 Aexp{(/)„(a::) - cpniv) + Rn{x,y)} 

for an exponent Rn = Rnix,y), which in the case of RWM is equal to 

n 

(2.2) i2™ = J2{gix,/Xi) - g{m/X^)}, 

1=1 

whereas for SLA: 

n 
i=l 

'm _ Xi \ g'jyi/Xj) +g'{xi/\i) 
XiJ 2 

-^{{g'{m/X.)f-{g'{x,/x,)f). 

In the case of R^^^, y and x are connected via (1.4), in the case of Rf^^ 
via (1.5). We impose the following regularity conditions on g and the density 
/ = exp{-9}. 

Condition 1. (i) The function g is infinitely differentiable with deriva- 
tives of all orders having a polynomial growth bound, 
(ii) All moments of / are finite. 

We henceforth assume that Condition 1 holds without further reference. 
We will explore the behavior of MCMC algorithms in equilibrium; we note 
that algorithmic properties could be different in transient regimes; see [10]. 
Thus, most expectations are considered in stationarity and E[-] will in gen- 
eral denote expectation under the relevant target vr^; we will sometimes 
write -Ett^ [•] or E'^^ [•] when we need to be explicit about the measure under 
consideration. 

Broadly speaking, our approach will be to identify the limiting properties 
of the exponent (pnix) — 4>niy) + Rn as n ^ oo. Through this term we will 
then be able to derive the asymptotic properties of the average acceptance 
probability E.,t^[anix,y)] and obtain analytical results for the squared-jump- 
distance over the various choices of the step-size an- We will work with 
Lq-limits and convergence in law, and we will obtain results under fairly 
general assumptions, suited to complex (nonproduct) probabilistic models 
arising in applications. In contrast, the analysis so far in the literature has 
been somewhat technical, focusing most on a rich array of results available 
in the (restrictive) product set-up. 
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2.1. The product case. To connect with the hterature, we wih initially 
state a result for the simphfied product case when (pn = for all n > 1. 
Several of the ideas in the proof will be relevant in the nonproduct scenario 
later. All stated expectations are in stationarity (here x ~ 7f„); $ is the CDF 
of AA(0, 1). To avoid repetition when stating results for RWM and SLA we 
define the following index /: 

RWM: / = 1, 

(2.4) 

SLA: / = 1/3. 



Theorem 1. Let itn in (1.3) he the target distribution for RWM and 
SLA. Assume that a'^ = l'^n~P for l,p> 0. Then, as n — > oo; 

(i) if p = 2k + I , for I as in (2.4), then E[an{x , y)] ^ a{l) , where 

a''WM{0 = 24.(-ly^), «^"(0 = 2*(4\/|^), 

for constants K^WM^ ^SLA > ^-^j^ ^rwm = i + 2k and r^^^ = 1 + 6k, 

(ii) if p > 2tv + I , then E[an{x, y)] 1, 

(iii) if2K<p<2K + I, then nPE[an{x,y)] ^ for any p>0. 
The constants /f^WM j^ShA 

are given in the proof and depend only 
on the density / appearing at the definition of TTn- The results are based on 
the limiting behavior of Rn- When p = 2k + I , Rn is subject to a central limit 
theorem, forcing a limit also for the average acceptance probability. Other 
step-sizes will n-eventually lead to a degenerate acceptance probability. We 
will see in the sequel that a quantity providing an index for the efficiency of 
the MCMC algorithms considered here, is the product 

alE[anix,y)]. 

Clearly, in the context of Theorem 1, this quantity is maximized (in terms 
of the dimension n) for p = 2k + /; it is larger for SLA than for RWM due 
to the presence of information about the target in the proposal. 

2.2. The general case. The above results describe the behavior of the 
expectation £^^f^ [1 A exp{i?„}] for large n. One could now look for conditions 
on the change of measure (1.3a) that allow some of these results to apply to 
the more interesting nonproduct scenario. The quantity under investigation 
is now E-ir^ll Aexp{(/)„(x) — (pniu) + Rn}], an expectation w.r.t. TTn- We will 
first consider the following condition: 
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Condition 2. There exists M > such that for any sufficiently large n 
\(t>n{x)\<M forallxGM". 

Of course, it would suffice that the condition holds 7r„-a.s.; in the sequel, 
conditions on (pji should, be interprctGd as conditions on some TTy^-vGrsion of 

(l)n = -logidTTn/dTTn). 

Theorem 2. Let in (1-3) he the target distribution for RWM and 
SLA. Assume that = Pn~P for l,p> 0. If satisfies Condition 2, 

then as n — > oo ; 

(i) if p > "^K + I , then liminf„_>oo E[anix, y)] > 0, 

(ii) if 2k < p < 2k + I , then n*'£'[a„(x, y)] for any p>0, 

for the appropriate index I for each of RWM and SLA specified in (2.4)- 

Condition 2 provides a recipe for a direct transfer of some of the results of 
the product case to the nonproduct one. A more involved result, motivated 
by the collection of applications we describe in the following sections, is 
given in the next theorem. For any s G M, we define the norm | • |s on M" as 
follows: 

/ n \ 1/2 

(2.5) \x\s=(Y.^'^1) ■ 

We also set | • | = | • l^. 

Condition 3. There exists M e E such that, for all sufficiently large n, 
(j)nix)>M forallxGM". 

Condition 4. There exist constants C > 0, p > and s < k-1/2, s' < 
K — 1/2 such that, for all sufficiently large n, 

\My) - Mx)\ < C{1 + \x\P + \y- x\P)\y - x\,, 

for all x,y£ M". 

Condition 4 is motivated by the application to conditioned diffusions in 
one of the following sections; we shall see that, in some contexts, it follows 
from a polynomial growth assumption on the derivative of (j)n- Condition 3 
prevents the nonproduct measure 7r„ from charging sets with much higher 
probability than 7r„. The following condition (clearly weaker than Condition 
4) will be relevant in the Navier-Stokes problem. We set R"^ = [0,oo). 
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Condition 5. There exist locally bounded function 5:M+ x M+ M+ 
and constants C > 0, p > 0, and s,s',s" all three (strictly) smaller than 
K — 1/2, such that, for all sufficiently large n: 

(i) l<^n(y) - <Pn{x)\ < S{\x\s, \y\s)\y - x\s>, 

(ii) |(^„(x)| <C(l + |x|^„) for all x,yeM". 

Theorem 3. Let in (1-3) he the target distribution for RWM and 
SLA. Assume that o"^ = Pn~P for l,p> 0. If {</>«} satisfies Conditions 3 
and 5, then as —> oo ; 

(i) if p = 2k + I , then E[an{x,y)] -^a{l), for a{l) defined in Theorem 1, 

(ii) if p > 2k + I , then E[an{x,y)] ^ 1, 

(iii) if 2k < p < 2k. + I , then nPE[an{x,y)] ^ for any p >0. 
Remark 1. Condition 5 implies the probabilistic statement 

(Pniy) - (t>n{x) 

for arbitrarily large q. This result, together with Condition 3, suffices for 
showing that the effect of the change of measure in the scaling properties of 
RWM and SLA is asymptotically negligible. 

2.3. Optimality. We use the squared-jump-distance as an index for the 
efficiency of different MCMC algorithms. Specifically, for the algorithms we 
have considered so far, we will calculate the quantity 

Sn.= E[{x'i,-X,*f] 

for some arbitrary fixed z*; here x' is the location of the MCMC Markov 
chain after one step given that currently the chain is at x ~ 7r„, and i* refers 
to a fixed element of {1, 2, . . . , n}. Note that 

Corr„ {xi* , x ) = 1 - , 
2 Var„ 

Corr„, denoting correlation and Var„ the variance of Xi* in stationarity. Thus, 
larger Sn implies lower first-order autocorrelation. 

Theorem 4. Let nn in (1.3) be the target distribution for RWM and 
SLA. Assume that a"^ = Pn~P for l,p> 0. If {(pn} satisfies Conditions 3 
and 5, then as n — > oo .■ 

(i) if p = 2k + I , then 

5„ = fa{l) X n-'^'^-^ + o(n-2'^-^) 

for a{l) defined in Theorem 1, 
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(ii) if p>2k + I, then Sn = l'^n~P + o{n~P), 

(iii) if 2k, < p < 2k. + I , then Sn = 0{n~'P) for any p>0. 

Theorem 4 shows how to scale the proposal step, as a function of n, to 
maximize the mean squared-jump-distance; we can then tune the parameter 
/. When maximizing the coefficient Pa{l) over / we retrieve a familiar result 
characterizing RWM and the Langevin algorithms: in the case of RWM, 
t^a{l) is maximized for that / for which aP^'^{l) = 0.234, and in the case 
of SLA for a^^^{l) = 0.574, for any choice of the reference density / and 
any change of measure satisfying Conditions 3 and 5. These characteristic 
numbers were first obtained in [21] and [22], in the simplified context of i.i.d. 
target distributions. 

2.4. A generalization. It is straightforward to extend the results so far 
stated to a larger family of target distributions. Such a generalization will 
be required in the applications considered in the sequel. 

Corollary 1. Allow the target distribution 7r„ to he as in (1-3) hut with 
parameters Aj that may also depend on n, so Aj = Xi^n- Assume that there 
exist constants C_, (7+ > and k > such that, for all n and 1 <i <n, 

(2.6) C-i-^ <Xi,n<C+i-''. 

Then all statements of Theorems 1-4 apply directly to the target tt^ with Aj = 
Xi^n except for those where the limiting prohabilities a^^^{l) and a^^^{l) 
appear. For the latter cases, the statements will hold after replacing the stated 
values for 7-^^^ and t^^^ with 

n n 

r™ = lim y ^SLA ^ ^-^ ^-(6«+i) y 

provided that the ahove limits exist. 

3. Applications in infinite dimensions. In this section we introduce a 
range of applications which require the sampling of a probability measure 
on function space. The common mathematical structure of these problems 
is that the measure of interest, vr, has density with respect to a Gaussian 
reference measure 7r~AA(m,C): 

(3.1) ^(x)ocexp{-,/.(X)}. 

The Karhunen-Loeve expansion for Gaussian measures on a Hilbert space 
allows us to view the target vr as a change of measure from a product of 
scalar Gaussian densities on M°°, thus casting the simulation problem into 
the theory presented in the first part of the paper. 
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We highlight two types of problem where the structure (3.1) arises nat- 
urally. The first type concerns SDEs conditioned on observations. Here the 
Gaussian reference measure is typically a conditioned Gaussian diffusion, in 
which nonlinear drifts are ignored, and the posterior measure is found via 
application of the Girsanov formula. The second type of problem concerns 
inverse problems for differential equations, where prior knowledge about an 
unknown function, in the form of a Gaussian measure, is combined with 
observations, via application of the Bayes formula, to determine a posterior 
measure on function space. The common structure inherent in these prob- 
lems allows for the use of the same notation in the different contexts: the 
mean of the reference Gaussian measure will be m and its covariance oper- 
ator will be C; the precision operator —C~^ will be denoted by £. The state 
space will be a separable Hilbert space TC. 

The Gaussian law is well defined if and only if C : W i— > W is a positive, 
self-adjoint and trace-class operator, the last property meaning that its 
eigenvalues are summable. Thus, we may construct an orthonormal basis 
of eigenfunctions {ei}°^i and corresponding eigenvalues {Xf}'?^^ satisfying 
Cei = Xfci with J2i '^i < oo. A typical draw X ~ Af{m,C) can be written via 
the Karhunen-Loeve expansion as 

oo 

(3.2) X = ^Xiei, Xi = mi + Xiii, 

4 = 1 

where the sequence {Ciji^i is i.i.d. with ~AA(0, 1) and mi = {m,ei). It is 
readily verified that 

oo 

E[{X -m)^iX -m)]=Y, Af (e^ a) = C, 

i=l 

which agrees (here (8) stands for tensor product, and expectation is over a 
random linear operator; see [12] for more details on theory of expectations on 
general Hilbert spaces) with the familiar identity for the covariance matrix 
on Euclidean spaces. From (3.2), the isomorphism X {xi} allows us to 
view M{m,C) as a product measure on £2, the space of square summable 
sequences. 

A Gaussian measure on a Hilbert space is thus easy to build, given an 
orthonormal basis for the space, simply by specifying the eigenvalues of the 
covariance operator. Such an approach provides also a natural way to specify 
regularity properties of Hilbert space valued random functions. To illustrate 
this, we define the norm || • ||s on 7i (which by the duality X {xi} can 
also be seen as a norm in £2) as follows: 

Coo \ 1/2 
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for s G M. The largest s for which ||^||s is finite is a measure of the regularity 
of X as it encodes information about the rate of decay of the coefficients Xi . 
From the Karhunen-Loeve expansion we deduce that 

oo 

E\\X-m\\l = J:^'^Xl 

i=l 

Thus, the Gaussian measure delivers realizations of finite || • ||<j-norm if Xf is 
chosen to decay like j-(2s+i+e) for some e > 0. Depending on the particular 
space under consideration, the definition of the || • \\s can slightly vary from 
the one in (3.3); we will give explicit definitions for each of the applications 
in the sequel. In many cases there is a natural relationship between the 
spaces of finite || • ||s-norm and various function spaces which arise naturally 
in the theory of partial differential equations. For example, in one dimension 
with function ei{t) = \/2sin(z7rt) one obtains Sobolev spaces (see, e.g., [23]). 



Remark 2. Finite-dimensional densities 7r„, 7r„, corresponding to ap- 
proximations of vr, vr, can be derived by spectral methods, finite difference 
methods or other approaches. We present such directions in the context of 
the example applications that follow. We show that under general conditions 
on the concrete functional (j) under investigation, its discretized counterpart 
<j)n will satisfy some of the conditions introduced in Section 2. Thus, we can 
use the results from that section to optimize MCMC methods applied to the 
complicated sampling problems presented in the next two sections. 

4. Diffusion bridges. There is a variety of applications where it is of 
interest to study SDEs conditioned to connect two points in space-time: the 
so-called diffusion bridges. We limit our study to problems with additive 
noise satisfying the equation 

dX , , ^, dW 

X{0) = x-, X{T)=x+. 

Here /i : M"^ i— > M'^ and 7 € M'^^'^; we define F = 77-^ and assume that F is in- 
vertible. We denote by n the d-dimensional Brownian bridge measure arising 
when h = and by vr the general non-Gaussian measure. Under mild condi- 
tions on h (see, e.g., [13]), the two measures are absolutely continuous w.r.t. 
each other. Their Radon-Nikodym derivative is given by Girsanov's theorem 
and assumes the general form (3.1) where now 

<I^{X) = l^l\h{X)\l dt - {h{X),dX)r^ 
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with (•,-)r = ("jr""^-), the latter being the standard inner product on W^. 
In this context 7i = L2{[0,T],M.'^). The expression for (f> above involves a 
stochastic integral; it can be replaced with a Riemann integral in the partic- 
ular scenario when T~^h is a gradient. In this context we will consider the 
bridge diffusion: 

dX [2dW 

(4.1) 

X{0)=x-, X{T) = x+ 

with inverse temperature /? > and potential 1/ rR'^ M. Applying Ito's 
formula and ignoring constants, we get that 

(4.2) cP{X)= C G{X)dt, 

where 







G{u) = ^\VV{u)\'^ -]^/W{u), uGM'^. 

A typical application from molecular dynamics is illustrated in Figure 1. 
The figure shows a crystal lattice of atoms in two dimensions, with an atom 
removed from one site. The potential V here is a sum of pairwise potentials 
between atoms which has an r~^^ repulsive singularity. The lattice should be 
viewed as spatially extended to the whole of 1? by periodicity. The removal 
of an atom creates a vacancy which, under thermal activation as in (4.1), will 
diffuse around the lattice: the vacancy moves lattice sites whenever one of the 
neighboring atoms moves into the current vacancy position. This motion of 
the atoms is a rare event, and we can condition our model on its occurrence. 
This application, as well as others involving conditioned diffusions arising in 
signal processing, are detailed in [6]. 

4.1. The Fourier expansion. We focus on model (4.1). The mean of the 
reference Brownian bridge Gaussian measure is m[t) = x~{l — t/T) + x^t/T. 
Without loss of generality we will assume that x~ = x~^ = 0, therefore m = 
0; otherwise, one should work with X — m. The precision operator of the 
Brownian bridge is the Laplacian (see [16]): 

r = ^^ 
2dt2 

with Dirichlet boundary conditions on [0,r], specified through the domain 
of C In the scalar case d=l, the (orthonormal) eigenfunctions {ej} and 
eigenvalues {A?} of the covariance operator C = —C~^ are 

|sin(i7rt/r), = |~2 



(4.3) ei = ei{t) = \ —sin{iTTt/T), \. = ——i- 
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Fig. 1. Crystal lattice with vacancy. We condition on the red atom moving into the 
vacancy. 

with z > 1. For general dimension d, the covariance operator is simply the 
diagonal operator matrix with copies of the covariance operator for d = 1 in 
the diagonal. In this context we will use the expansion 



i=l 

A finite-dimensional density approximating exp{—(j){X)} can be obtained 
via spectral truncation. For n = d x N and the vector in 



oo 



(4.4) 








N 



(4.6a) 



i=l 



By the Karhunen-Loeve expansion, the reference measure will be 



(4.6b) 




for the Aj's in (4.3). 



4.2. The results. For s G M, we define the norm: 

/ oo \ 1/2 




We need the following condition: 



^Note that Pn{x) is the L2-projection of X onto the hnear span of {ei,e2, . . . ,ejv}. 
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Condition 6. G is bounded from below and there exist constants C > 0, 
p> for which 

\G{w)-G{v)\ <C{l + \v\P + \w-v\P)\w-v\ foT al\v,weM.'^. 

Note that the stated condition on \G{w) — G{v)\ is imphed by a poly- 
nomial growth assumption on the gradient of G. Condition 6 will imply 
Conditions 3 and 4 making possible the application of the results of the 
previous section for the target 7r„, in (4.6). To see this, for X ,Y & 7i we use 
Condition 6 to get 

\cP{Y) - 0(X)| < G /^(l + \X{t)\P + \Yit) - Xit)\n\Y{t) - X{t)\ dt 

<G'il + \\X\\l^^ + \\Y-X\\l^)\\Y-XU,, 

where for the second calculation we used Cauchy-Schwarz and the triangle 
inequality for the L2-norm. Now, the Sobolev embedding theorem (stated 
in Lemma B.3 in Appendix B) gives that for any 2 < g < oo one can select 
s = s{q) < 1/2 so that 

II^IIl, <C||^IU 

for all X. Thus, continuing from (4.7): 

(4.8) \ct>{Y) - <A(X)| < C(l + \\X\\P + ||y - XE)\\Y - X\\l, 

for some s = s{p) < 1/2. Equation (4.8) can now be used to deduce that 
4)n{x) = (I){Pn{x)) satisfies Condition 4. 



Proposition 1. If G satisfies Condition 6, then the limiting results of 
Theorems 3 and 4 apply for the case of Tin in (4-6) under the specifications: 

_SLA 0'^' 



1 _RWM P'^ _SLA 

1, T =— — — -, r 



6T2d2 ' 56r6(i6 ■ 

4.3. Finite differences. Similar results also hold for other approximation 
schemes such as finite differences. In particular, assuming that paths are 
approximately constant on subintervals of length At = j^y^ , we get that 

TV 



^ (X) e^pj - E G{Xt)M^ , At : 



N + 1 



for argument X = {Xt)^i, with Xt G M'^. The reference law 7r„, represents 
the finite-dimensional distributions of d independent Brownian bridges at 
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the discrete-time instances At, 2At, . . . , NAt: 

2 At2 



f-2 1 
1 -2 1 



n^ = '[[MiO,-L-^At-^), L 



i=l 



1 -2 1 
V 1 -2/ 

The eigenvectors {cj}^^ of C = —L~^ and its eigenvalues {A?^}^]^ are as 
fohows [in this context = (ej^i, . . . , Ci^N)]'- 



The natural inner product for in this context is {u,v) = J^iLiUiViAt 
under which one can verify that {ei}^^ is an orthonormal basis. We factorize 
TTn via the Karhunen-Loeve expansion. That is, we write 

N 

(4.9) Xt = Y,x.,^ei,t 

i=l 

and work on the space of x = {x'^i,x']'2, ■ ■ ■ ,x'^j^) when the target can be 
rewritten as 

(4.10) gi(x) oc expj - ^ G(X,) At I , vf. = J] | R -^(0' ^'n) | • 



Proposition 2. If G satisfies Condition 6, then the limiting results of 
Theorems 3 and 4 apply for the case of TTn in (4-10) under the specifications: 

= 1 ^RWM ^ SLA ^ 5/3^ 

' T^d^' 2T6d6- 

5. Data assimilation. Data assimilation is concerned with the optimal 
blending of observational data and mathematical model to enhance predic- 
tive capability. It has had major impact in the realm of weather forecasting 
and is increasingly used by oceanographers. As a concrete (and simplified) 
model of these applications from the area of fluid mechanics, we consider 
the problem of determining the initial condition for Navier-Stokes equations 
for the motion of an incompressible Newtonian fluid in a two-dimensional 
unit box, with periodic boundary conditions, given observations. The equa- 
tion, determining the velocity X = X{t,v) over the torus domain T^, can be 
written as follows: 

dX 

— = uAX + B(X,X) + h, t>0, 
dt 

X{0,v)=Xo{v), t = 0. 
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This should be rigorously interpreted as an ordinary differential equation 
in the Hilbert space H, defined as the closure in L2(t2,R2) ^.^e 

space 

of periodic, divergence-free and smooth functions on [0, 1] x [0, 1], with zero 
average (see, e.g., [11] for more details). We specify the operator A in detail 
below, noting here that it is essentially the Laplacian on 7i] the operator B 
is a bilinear form and h a forcing function, but details of these terms will 
not be relevant in what follows. 

As a first example of the type of data encountered in weather forecasting 
we assume that we are given noisy observations of the velocity field at po- 
sitions {vi}f^Y and times {tm\m=i- To be precise, we observe {W/^m} given 
by 

Wi^rn = X{vutm)+il,m^ l = l,...,L, m = l,...,M, 

where the ^^^m's are zero-mean Gaussian random variables. Concatenating 
data we may write 

W = z + C 

for W = {Wi,i,W2,i, . . . ,M^l,m), z = {X{vi,ti),X{v2,h), . . .,X{vL,tM)) and 
^ ~ AA(0, S) for some covariance matrix S capturing the correlations in the 
noise. We refer to this setting as Eulerian data assimilation, the word Eu- 
lerian denoting the fact that the observations are of the Eulerian velocity 
field. 

For a second example, illustrating a data type commonly occurring in 
oceanography, we assume that we are given noisy observations of Lagrangian 
tracers with position z solving 

^ = X{zi,t), zi{0) = zifi, l = l,...,L. 

For simplicity assume that we observe all tracers z at the same set of times 
{tm}m=i and that the zi^^s are known to us: 

Wl^rn = Zlitm) + (,l,m, l = l,...,L, m = 1, . . . , M, 

so that, similarly to the Eulerian case, we may write 

W = z + ^ 

with W = {Wi^i,W2,i,...,Wl,m), z = {zi{ti), Z2{ti), . . . , ziitM)) and noise 
^~AA(0, S) for some covariance matrix S. Figure 2 illustrates the set-up, 
showing a snapshot of the flow field streamlines for X{v, t) and the tracer lo- 
cations zi(t) at some time instance t. We refer to this problem as Lagrangian 
data assimilation, since the observations are of Lagrangian particle trajec- 
tories. 

Note that, for both Eulerian and Lagrangian observations, 5 is a function 
of the initial condition Xq of the Navier-Stokes equation. 
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In applications to weather forecasting, compressible fluid flow models 
would in fact be more appropriate. We have chosen to use the incompress- 
ible Navier-Stokes equation to model the fluid in both examples simply to 
unify the presentation of the Eulerian and Lagrangian data models. A more 
realistic model for weather forecasting would employ a nondissipative model 
for the velocity field, supporting waves, such as the shallow water equations 
[4, 19]. The techniques described in this section could be generalized to such 
models. 

5.1. The Fourier expansion. We now construct the probability measure 
of interest, namely the probability of Xq given Y, for both the Eulerian and 
Lagrangian problems. Any mean-zero X £ L^(T^,C^) can be expanded as a 
Fourier series in the form 



The divergence-free condition is equivalent to setting Xk ■ k = for all k, so 
we can form an orthonormal basis for 7i by letting 



-^i'^) ~ ^ Xkexp{2TTik ■ v). 
fcez2\{o} 



Ck = jYj exp(27riA; • x), 



where k = {k2, —ki). Then any X £?{ may be written as 



X= ^ Xkek = ^{X,ek)ek. 

fcGZ2\{0} k 




0.0 



0.4 



1.0 



Fig. 2. An example configuration of the velocity field at a given time instance. The small 
circles correspond to a number of Langangian tracers. 
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We want to restrict attention to real-valued functions, whence the above 
expansion can be restated as 

(5.1) X= Y,{4^^et+:r^^^-} 

for the orthonormal ef^{u) = ■^\/2sin(27r/c • n), e™''(u) = ■^\/2cos(27r/c • u) 
and the upper half of \ {0}, 

Zl = {k = {ki,k2) £Z'^:ki>0,k2>0 or ki < 0, ^2 > 0}. 

The Stokes operator A is such that 

(5.2) Aek = 4TT^\k\^ek. 

We will assign a prior measure, vr, on Xq. We choose this to be a Gaussian 
measure with mean zero and precision operator C = —A'^ for a positive real 
a. To be precise, the covariance operator will be 

C=Y, (47r2)-°|A;|-2"(ef° ® ef" + er)- 

For the Gaussian measure to be well defined it is necessary that C is trace- 
class, that is, Yl,k&? |^|~^" < oo, which requires a > 1. We condition the 
prior on the observations, to find the posterior measure, vr, on Xq. As noted 
before, z is a function (the so-called observation operator) Q of Xq, the 
initial condition, so we may write 

W = G{XQ)+i, 

where one should have in mind that Q = Q^^^ and Q = Q^^^ for the Eulerian 
and Lagrangian case, respectively. The likelihood of W conditionally on Xq 
is 

P[y|Xo]ocexp(-i|VF-g(Xo)||). 
By the Bayes rule we deduce that 

(5.3) ^(Xo) oc exp(^-^|iy - g(Xo)ll) , = AA(0,C). 

We have thus constructed another explicit example of the structure (3.1) 
where now 

ci>{x) = \\w-g{x)\l. 

By tuning a we may induce more smoothness in the prior and posterior 
measures. From the Karhunen-Loeve expansion, under the prior Gaussian 
measure vr we get 

(5.4) xf,xr ~Ar(0,(47r2)-"|fc|-2"), 



18 



A. BESKOS, G. ROBERTS AND A. STUART 



all coefficients being independent of each other. A finite-dimensional approx- 
imation of (5.3) will be derived by truncating the Fourier series. For integer 
> we define 

Zu,N-={k^^u-\^^i\At^2\<N}. 

One can check that the cardinality of is 2N{N + 1). We will arrange 
x'^^ with k G Z^tv re-dimensional vector x, with re = AN{N + 1). 

To this end, we order the elements of spiral-wise, as shown in Figure 3; 
the construction gives rise to an ordering mapping a : i-^ Z"^ , analytically 
specified as 

a{{i,j))=2{N-l)N + M, 

where 

N = N{{i,j)) = \i\ Vi, 

{j + 1, i = N,j < N, 

2N-i + l, j = N,i>-N, 
m-j + l, i = -N. 

The mapping is better understood via Figure 3. We set 




(-N,0) (-1,0) (0,0) (1,0) (2,0) {N,0) 



Fig. 3. The construction of the ordering mapping (t:Z^k->Z^. The circles represent 
points in Zfj. The paired numbers in parentheses are coordinates of points; the single 
numbers show the ordering of points w.r.t, o{). For example, (t((2, 0)) = 5, f((2, 2)) = 7. 
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5.2. The results. In this context, for s G M we define 

1/2 



with corresponding space H'^ = {X : \\X\\s < oo}. We wiU need some analyti- 
cal results from [11]. Assume that the force h in the Navier-Stokes equation 
is such that /q H/iH^dt < oo for some e > 0. Then, in the Eulerian case it is 
true that 

\Q^^'^{X)\<C{l + \\Xf) 

for any X ^7i. Also, there exists locally bounded function 5 : M"*" x R"*" M"*" 
such that 

I^EUL^y) - gEUL(^)| < ^(ii^ii^ ||y||)||y - x\\. 

For the Langrangian case we have 

\G^^^{X)\<C{l + \\Xff^ 

and, if X, y E Ti^ for some s > 0, then there exists locally bounded 5 such 
that 

igLAG(y) _ gLAG(^)| < ^(ii^ii^^ ||y||,)||y - x\i. 

Note that < oo for any s < a — 1, so we can restrict attention on the 

set of full probability measure W° for some chosen ^ < sq <a — 1 instead 
of the whole 7i. Under this observation, one can see that the conditions on 
g^^^ and g^UL both imply that for Y,X£ 

|<A(y) - <AWI < ||y||,j||y - 

(5.7) 

\<t>{X)\<C{l + \\X\f) 

for some locally bounded function 5; these are reminiscent of Condition 5 
which, as we show in Appendices A and B, they imply after making the stan- 
dard correspondence between X and its Fourier coefficients and employing 
a spectral truncation to obtain a sampling problem in R". 

Proposition 3. For any a > 1, the limiting results of Theorems 3 and 
4 apply to the target 7r„ in (5.6) both for Q = ^^UL g _ gLAG ^ under 
the specifications 

K = a/2, ^ 1^2a r (^2 ^ y2)a 

2 J[o,i]2 



^SLA 



-vr^" / {x^ +y'^f''dxdy. 
2 7[o,i]2 
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6. Further directions. We present some additional MCMC samplers which 
can give improved results. 

6.1. Preconditioning. The local MCMC samplers we have so far consid- 
ered use the same step-size over all directions of the target domain. The 
standard deviations decrease as i~^ and the sampler will adjust its step- 
size to accommodate for the smallest scale, 7i~^, resulting in the 0(n~^'^~^) 
squared-jump-distance reported in the paper. When the different scales are 
analytically known one can think of allowing the step-size to vary over the 
different coordinates. We will now briefly pursue such a direction: we demon- 
strate that the analytical conditions stated so far in the paper suffice to also 
describe these slightly modified algorithms. 

We follow the general context with target distribution 7r„ in (1.3) and 
scalings Aj = Ai.„ with C^i~'^ < Xi^n ^ C+i~'^ for some k>0. Consider the 
following MCMC proposals: 

(6.1) P-RWM■.yi = Xi + anXi,nZ^, 

(6.2) P - SLA: yi = Xi + ^^(-^'(x./Ai,^)) + anXi,nZi, 

which compared with the original RWM and SLA, adjust the step-size to the 
individual scales and use step-size CnXi^n for the ith coordinate instead of o"„. 
The prefix (P-) stands for "preconditioning" referring to the effect of these 
algorithms to fiatten the standard deviations along the scalar components 
before applying a regular RWM or SLA proposal. For the following result 
we set A = A„ = diag{Ai,„, . . . , Xn,n}- 

Corollary 2. Let-Kn in (1.3) he the target distribution for P-RWM and 
P-SLA. Assume that cr,^ = Pn~^ for l,p> 0. If {0n} satisfies Conditions 3 
and 5, then: 

(i) ifp = I, then E[an{x , y)] ^ a{l) and 



Xi*,n ) J 



for ail) as in Theorem 1 under the specification r^^^ = r^^^ = 1, 
(ii) ifp>I, then E[an{x,y)] ^ I and = Pn~P + o{n~P), 
(in) if < p < I, then E[an{x,y)] = 0{n~P) and = 0{n~P) for any 
p>0. 

Proof. On the transformed space x i— > A~^x algorithms P-RWM and 
P-SLA coincide with RWM and SLA, respectively, as applied to the target 
TT^ specified as 

^(x) =exp{ -(/.„( Ax)}, jr^^{x) = Y[f{xi). 

n i=l 
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It is a straightforward consequence of the definition of the |x|s-norm that 
if (pn satisfies Condition 5, then so does 4>n{x) = 0„(Ax). So, all results of 
Theorems 3 and 4 apply for the target vr^. The statements of the corollary 
correspond only to a translation of these results on the original coordinates 
2; = Arc*. □ 

6.2. Implicit scheme and Gaussian law. An implicit version of SLA ap- 
plied when the reference measure is Gaussian satisfies an identity allowing, 
in some cases, for 0{1) squared-jump-distance. To see this, assume that TTn 
is Gaussian with / = A^(0, 1) and consider the proposal y for current position 
X defined through the equation 



for 9 G [0,1]. Note that 6 = corresponds to the standard SLA proposal 
(1.5). After some calculations we find that for this proposal the acceptance 
probability is 



Clearly, an{x,y) = 1 if = 1/2 and (pn = 0, so Gaussian targets are invariant 
under the update (6.3) with 6 = 1/2 (which corresponds to the familiar 
trapezoidal rule in the numerical analysis literature) for any step-size o"„. 
Thus, in the Gaussian scenario, we get 0(1) squared-jump-distance. Even 
for the case when the change of measure makes the target non-Gaussian, 
we have shown in [5] and [6] that such implicit proposals can provide well 
defined MCMC algorithms for infinite-dimensional targets having a density 
w.r.t. a Gaussian measure thus giving rise to algorithms of 0(1) squared- 
jump-distance in the discretized context. 

7. Conclusions. We have presented a family of nonproduct distributions, 
arising naturally in high-dimensional applications, for which analytical re- 
sults for the asymptotic behavior of local MCMC algorithms can be ob- 
tained. The results in the paper constitute a contribution toward the under- 
standing of the computational complexity of Metropolis-Hastings methods 
in application to high-dimensional, complex target measures with mathe- 
matical structure tailored to a range of important applications. 

In this context, the inverse of the squared-jump-distance 5„ provides a 
measure of the number of steps required to cover the invariant measure, as 
a function of the dimension of the state space, n. In a concrete application 
this needs to be combined with information about the cost of each proposed 
step, again as a function of n. To be concrete we consider the case where the 



(6.3) 




an{x,y) = 1 Aexp 




22 



A. BESKOS, G. ROBERTS AND A. STUART 



reference measure is a mean-zero Gaussian one with covariance operator C„ 
and precision operator Ln = —C~^ and observe that, although our analysis 
in the paper is conducted in the Karhunen-Loeve basis, we need not assume 
that this is known to implement the methods. That is, we may write the 
proposals as: 



RWM: 


y = 


X + anZ, 


SLA: 


y = 


2 

X H -^CnX + (JnZ, 


P-RWM: 


y = 


x + anCl/'^Z, 


P-SLA: 


y = 


X 2""^ ~^ ^nCfJ Z ^ 


6'-SLA: 


y = 


2 

X + '^{eCny + (1 - e)CnX) + OnZ. 



Having in mind also the calculation of the acceptance probability, all meth- 
ods require evaluation of (pnix) and Cnx; then, P-RWM and P-SLA require 

2 

drawing from N(0,Cn) and &-SLA (for 6 = 1/2) inverting /— ^Cn- All such 
costs should be taken into account for an overall comparison of the differ- 
ent algorithms. Thus, even in the case of Gaussian reference measure, the 
relative efficiency of the methods depends crucially on the precise struc- 
ture of the reference measure; for instance, the case of Markovian reference 
measures, for which the precision operator has a banded structure, will be 
markedly different from arbitrary non-Mar kovian problems. 

From a mathematical point of view the results in this paper center on a 
delicate interplay between properties of the reference measure and properties 
of the change of measure. The tail properties of the reference measure are 
captured in the scaling properties of the standard deviations [see (1.3c) and 
(2.6)]. The assumptions we make about (p in Conditions 3, 4 and 5 control 
the manner in which the target measure differs from the reference measure, 
in the tails. Since the tails control the optimal scaling of the algorithms this 
is key to the analysis. In particular the conditions on the exponents used 
in the norms in Conditions 4 and 5, and their upper bounds in terms of k, 
are precisely those required to ensure that the product measure dominates 
in the tails; the choice of norms really matters as we approach an infinite- 
dimensional limit. It is notable that the conditions imposed on the change 
of measure do indeed hold for the complex infinite-dimensional applications 
that we consider in this paper. 

We anticipate that there is a whole range of other applications which fall 
into the class of problems we consider in this paper. For example, one other 
natural application area is the case when 7r„ represents a prior of indepen- 
dent components for a Bayesian analysis with dirn/dTTn corresponding to the 
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likelihood function. In this context, for instance, the lower bound for in 
Condition 3 (resp. upper bound for the likelihood) will be typically satisfied. 

A particularly interesting avenue for further study in this area concerns 
the possibility of obtaining diffusion limits for the MCMC methods when 
the proposal steps are scaled optimally. This program has been carried out 
in the case of i.i.d. product targets in [21] and [22] (see also [1] and [2]). In 
that case each component is asymptotically independent from the others, 
so it is possible to consider an independent scalar diffusion process in each 
coordinate. In the nonproduct scenario considered in this paper it is antic- 
ipated that the diffusion limit will be a stochastic PDE which is reversible 
w.r.t. the target measure. Proving such a result would yield further insight 
into the optimality of MCMC methods. 

APPENDIX A: TAYLOR EXPANSIONS FOR MCMC 
ALGORITHMS (TARGET njy) 

We present a Taylor expansion for the term Rn in (2.1) to be used through- 
out the proofs. In this context we assume that the target is 7r„ with x ~ vfri- 

A.l. The RWM algorithm. We consider the exponent Rn = R^^ in 
(2.2). 

Case A (a,^ = Pn~'^'^~^n~^ with e > 0). Viewing i?„ as a function of 
an, we employ a second-order Taylor expansion around (T.„ = 0, separately 
for each index i = 1, . . . , n. Thus, 

(A.l) Rn = Al,n + A2,n+Kn, 

where we have defined 

i=l i=l i=l 

(A.2) Ci,i = -g'{xi/X,)Z,, C2,^ = -9"{xi/X,)Zf/2, 

Ui,n = -g^^\x,/\i + Z,a*/\i)Zf/Q 

for some a* G [0,0"^] different for each i. Note that {Ci^jji, {C2,i\i are both 
sequences of i.i.d. random variables. Using Condition l(i) we find that 

(A.3) \Ui,n\ < Mi{xi/Xi)M2{Zi)Ah{a*/X,) 

for some positive polynomials Mi, M2, M3. From Condition l(ii) we get 
E[Mi{xi/ Xi)] < 00; also E[M2{Zi)] < 00, both these expectations not de- 
pending on i. Note that OnjXi — > 0, so M3(c7|/Ai) < Kq for a constant 
i^o > 0. We can now obtain the following results: 
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• Un > 0, 

. if £ > 0, then Al^n ''^"^ 0, ^2,n ''^"^ 0, 

. if e = 0, then Ai,n ^ iV(0, l^j^), A2,n ''^"^ -TT^, where 
(A.4) K = K^"^^ = Ef[{g'{X)f]. 

The hmit for Un fohows directly from (A. 3). Also, the stated limits for 
e > follow from simple calculations. For the results when e = we note 
that a version of the Lindeberg-Feller theorem for the case of scaled sums 
of i.i.d. variables (see, e.g.. Theorem 2.73 of [7]) gives that Ai^n converges 
in law to a zero-mean Gaussian random variable with variance the limit 
as n — > oo of „] which can be easily found to be I^E'fCf .]/(! + 2k). 
For A2,m straightforward calculations give that it converges in L2(vrn) to 
/^£'[C2,.]/(1 + 2k); the product law gives that 

Efy{Xf] = Ej[g"{X)l 

so, in fact, the limit for ^2,n can also be expressed in terms of K. 

Case B {a^ = f'n~'^'^~^rf with e £ (0,1)). We now select a positive 
integer m such that 

m+l>2/{l- e) 
and use the m-order expansion: 

m 

in the place of (A.l), where now 

i=l \ i=l \ 

Cj,i = -g^^\xi/X,)Zy{j\), UL = -5^™+^)(x./A. + Z,a*/X,)-^-— 

[m + Ij! 

for some corresponding a* G [0, cj„]. We can now obtain the following results: 

, Li(7r„) 

• Un — ^ 0, 

• E[Rn] — > — oo as fast as —rf, 

• for any positive integer q, E[{Rn — E[Rn\f''^] = C'(?i^''^). 

For the first result note that the residual terms U[n can be bounded as in 
(A. 3), so the limit follows from the particular choice of m. For E[Rn\ we 
remark that, since -E'[^i^„] = 0, we have 

m n j 

E[Rn]=J2E[Aj,n] + Oil), E[A,,n]=Y.^E[CjA- 

j=2 i=l 
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From the analytical expression for C2,i- 

E[C2,] = -\ I {g'{u)fe^^{-g{u)}du<Q. 

^ JR 

All other Cj_. satisfy E\Cj^.\ < oo. Trivially, the highest-order term among 
the summands for E[Rn] is the one corresponding to j = 2 which indeed 
grows to — oo as fast as —n^. For the third result, among the (m + 1) zero- 
mean sums comprising i?„ — E[Rn] the one with the highest-order L2(j-iiorm 
is Ai^m so the triangle inequality gives that the provided expectation is of 
the same order as Now we can take out of the expectation the Ji^/^ 

factor of an', the remaining expectation is 0{1). To prove this last statement 
one needs to consider the polynomial expansion; we avoid further details. 

A. 2. The SLA algorithm. We now use Taylor expansions for the corre- 
sponding term i?„ = Rf,^^ given in (2.3). 

Case A (a^ = l'^n~'^'^~^/'^n~^ with e > 0). We employ a sixth-order Tay- 
lor expansion to obtain the structure 

6 

Rn — ^ ^ "^ii" ~^ 

j=3 

(A.5) 

i=l \ 1=1 i 

We start the summation from j = 3 since the first two sums are identically 
zero. The above expansion considers some corresponding i.i.d. sequences 
{^j,i}2=i each j and residual terms; any Cj^i is a polynomial function 
(the same over i) of Zi and Xi/Xi. Using the calculations in [22] we have 

Ci, = C2, = 0, EiC-i,] = E[Ci,.] = ^[Cs,.] = 0, E[Ce,] < 0. 

From this paper we will also require the analytical calculation 

C3,^ = g^'^Hxi / Xi) Zf / 12 -g\xi/X^)g"{x^/Xi)ZJ 4. 

From the product rule, 

Ef[{g'{X)g"{X)f] - 2Ef[{g' g" g^^'>){X)] = EfW^Xf], 

thus 

E[Cl] = Ef[3g"iXf + 5(ff(3)(x))V48. 
We will need the following results: 
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• if e > 0, then Aj^n ^^^^ for all 3 < j < 6, 

• if e = 0, then A3,n ^ N{0,fK), both A,n, A,n and A,n 

where 

(A.6) K = i^SLA ^ Ef[3g"{Xf + 5(5^=^) (X)) ^48 > 0. 

The residuals can be bounded as in (A. 3); the limit is then straightforward. 
For the other quantities we work as in the case of RWM; note that the 
identity 

E[Cl]+2E[C<i,]=0 

demonstrated in [22] allows for the limits of As^n and Ae,n to be expressed 
in terms of the same constant K. 

Case B {a^ = fn~^''~^/^n^ with e G (0, 1/3)). We now use an m-order 
expansion, where m is such that 

(?n+l)(l-3e)>6 

and consider the corresponding expansion 

m 

Working as for Case B of RWM we obtain the following results: 

• ^ 0, 

• E[Rn] — > —00 as fast as n^^, 

• for any positive integer q, E[{Rn — E[Rn\f''^] = C'(n^^ '^). 

APPENDIX B: PROOFS 
The following lemmas will be used at the proofs. 

Lemma B.l. Let T he a random variable. Then: 

(i) for any 7 > 0.' 

E[l^e']>e-'<{l-^y 

(ii) if E\T] = — c < 0, then for the residual Tres = T — E\T\ and any q > 0: 

i?[lAe^]<e-^/2^2«S£!d!l. 
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Proof. For the lower bound we have that 

E[l A e^] > ^[(1 A e^) • /{|r| < 7}] > e-^P[|r| < 7], 

which, from the Markov inequahty, gives the required result. For the second 
inequality: 



E[l A 



E 



{l^e^)■I\\T,,,\<- 



+ E 



(lAe^)-/ |rre.|>- 



which, again from Markov inequality, gives the required result. □ 
Lemma B. 2. If X M{^i,a'^), then 



Lemma B.3 (Sobolev embedding), (i) Let X £ L'^{[{),T],M). Consider 
the expansion w.r.t. the sinusoidal basis {sin(A;7r • /T)}^^: 

00 

X{t) = ^Xksm{kTrt/T). 

k=l 

// s,p G M are such that s < ^ and 2 <p < — s)"^ , then 

\\X\\l,<C\x\s 

for a constant C > 0, where \x\s = (X^fc^i ^^'*l^fcP)^^^■ 
(ii) Let X = {Xt)^i € for integer N > 0. Consider the sinusoidal basis 
, {sin(-^^^); t = 1, . . . , A^}^-^, and the expansion: 

Xt = xu sin I I . 

\N + l) 

If s,p G M are such that s < ^ and 2 <p < {-^ — s)^^ , then 

1 



N 



1/p 



\t=i 



N + l 



<C\x\ 



for a constant (independent of X and n) C >0. 

Proof, (i) If y G L^{[-T, T], C) is periodic with period 2T, its Fourier 
expansion is 

00 -| 
(B.l) Yit)= y, = _(y,e*WT). 



fc=— 00 



2T 
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The Sobolev embedding (see page 436 of [23]) gives that, for s,p as in the 
statement of the lemma, 

{k=oo -\ 1/2 

{l + \k\^^)\yu\^\ . 
k=—oo ) 

This is a consequence of the fact that for conjugate positive reals p, q (i.e., 
+ = 1) we have 

(B.3) \\Y\\L,<C\\y\\i^. 

See the above reference for more details. Assume now that Y is specified 
as follows: Y{t) = X{t) for t > 0, and Y{t) = -X{-t) when t < 0. This 
symmetricity around the origin means that y-k = —yt (for instance, yo = 0), 
so using this: 



y(t) = ^2iyfcsin(A;7rt/r). 



fc=i 

X, Y coincide on [0,T] so = 2iyk and, using (B.2): 

II-'^IIlp{[o,t],r) < II'5^IIlp{[-t,t],r) < C\x\s. 

(ii) Consider a vector Y = {Yt}'^L_(^j^_^i^ S C^^"*"^. We define a function Y 
on [-T,T]: 

N 

Y{t)= YjIijT/(N+l),ij+l)T/{N+l)){t), F(r) = YL(jv+i). 

j=-(7V+l) 

The continuous- and discrete-time Fourier expansions can be written as 
follows: 

oo N 

Y{t)= Y: Vke''-'^^, Y= E 

k=-oo k=-(N+l) 

where, after some calculations, we find that for A; G Z with k^O: 
1 ^ 

2ikTT . ^ 

with yo = Ef=-(7v+i) Yj. Also, for -(iV + 1) < A; < N: 



N 

i 

Vk 



1 ^ 

+ 1). 
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One can easily now check that, for —{N + 1) <k <N, 

I ^-ink/{N+l) _ I 
in k/{N- 

whereas for Af G Z and -(A^ + 1) < A; < iV 



ilT k/{N + l) 



k 

yM(2N+2)+k - M(2iV + 2) + fc^'- 

Note that |e*'^" — 1| < C\a\ for a G [0, 1]. So, using the last two equations and 
(B.3), for conjugate p, q we get 



/TV 1 \ f ^ 

\t=-{N+l) ^ / \k=~{N+l) 



= WVWh- 



An application of Holder's inequality gives (see page 437 of [23] for details) 
that 

( k=N \ 1/2 

\\y\K<c\ a+mk\'\ 

[k=-{N+l) J 

for a constant C independent of {yk} and n. So, in total: 

/ N ^ \^^^ ( ~1 

\t=-(Ar+l) / U=_(Ar+i) ) 

To prove the statement (ii) we use the standard method as in (i): we specify 
the vector Y = {Yt} as Yt = XtJor t = 1, . . . , N, Yq = 0, and Yt = -X_t, for 
t = —N, . . . , —1, YL(7v+i) = 0. Then one can find that yo = y_(7v+i) = and 
j;^ = 2zyfc for k = 1, . . . ,n. The required result then follows directly. □ 

Proof of Theorem 1. Expectations are in stationarity, so 2: ~ 7r„ 
and y is determined from the transitions (1.4) or (1.5) for RWM or SLA, 
respectively. 

The RWM algorithm. Case (i): = Pn''^'^-^. 
From Appendix A.l we get that 



2 1 + 2k 1 + 2k 
So, Lemma B.2 gives that 

lim E\an(x,y)] = 2<^>( --J ^ 
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Case (ii): = fn~'^'^~^n~^ for e > 0. 

From Appendix A.l we get that i?„ — > in Li(7r„). The result follows 
from the Lipschitz continuity of x i-^ 1 A . 
Case (iii): cj^ = l'^n~'^'^~'^rf for e e (0, 1). 

Appendix A.l gives that E[Rn] — > — oo as fast as —rf, and that for arbi- 
trary integer g > 0, E[{Rn — = 0{n^''^). So, Lemma B.l(ii) implies 
that E[an{x,y)] faster than any polynomial-order. 

The SLA algorithm. The proof is similar and follows from the results in 
Appendix A. 2. □ 

Proof of Theorem 2. Here a„(a;,y) = 1 A e''^"(^)~''^"(^)+^" , for i?„ as 
in the product case. Note now that 

for a constant C > 0. So, the required result for 2k < p < 2k -|- / follows now 
directly from Theorem l(iii). For the case when p>2k + I note that, using 
the Taylor expansions for Rn in Appendix A, we can easily find that 

lim sup I ii„ I < oo. 

n— too 

The boundedness condition on cpn gives that 

lim sup Stt^ \(j)n{x) - (j)n{y) + Rn\ < Ci + C2 lim sup £'^„ \Rn\< 00 

n— too n— too 

for constants Ci, C2 > 0. So, Lemma B.l(i) implies that £'^„ [an{x, y)] is lower 
bounded. □ 



Proof of Theorem 3. We wiU first show that, for any p > 2k, Condi- 
tions 3 and 5 imply that (pniu) — 4'n{x) — > in Lq{'Kn) for any g > 0, for both 
RWM and SLA. We will then proceed with some calculations to obtain the 
required results. 

Let cj^ = Pn~'^'^~^ for some e > 0. Recah that under TTn, Xi ~ AA(0,i~^'') 
independently over i. Note that for an arbitrary s < k — 1/2, 



E- \r\ = E- 



E 

.i=l 



i=l 



for a constant C > 0. Similarly, we find that if q is a positive integer, then 
^s-nkls^ ~ iJ2i=i i^^**""))^, so for any integer q>0: 

(B.4) E^Jx\l'^<C. 

This result directly implies that, for both the RWM and SLA proposal y, 
(B.5) E^Jy-xll'^^O. 
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To see this note that, from the triangle inequahty, apphed for the | • |s-norm, 
and the definition of the proposal y, 

E^Jy - x\l^ < C{E\anZ\f + E^Mf) 

2 

for vector z with Zj = in the case of RWM, and Zi = ^g' [xi/ Xi) / Xi in 
the case of SLA. Note now that we can write xj = for some i.i.d. 

random variables (in this case ~ Xi ) but the particular distribution is 
not important for our argument, as long as it has finite moments). Similarly, 
we can write ((T„Zi)^ = n~^n~^'^^^ and zf = n~^^n~^'*(n/i)~^''^^ for some 
i.i.d. positive random variables different for each case. It is now clear 
that 

which explain (B.5). Given (B.4), (B.5), the triangular inequality implies 
(B.6) E^Ml'<C. 

We now set A(/>„, := 4)n{y) — (pnix) and proceed as follows: for any i? > 0, 

E\AcPn\'' = E[\AcPnm\x\s < R, \y\s < R]] + E[\A(l)nm\x\s > Rov \y\, > R]] 
< i{R)E\y - x\l + CE[{1 + \x\ll + \yZt)l[\x\s > Ror \yU > R]] 
<^{R)E\y-x\l, 

+ C{E[1 + Ixl^P" + \yfj','^]y/\F[\x\s >R]+ P[|yU > R])'^^ 

where 7(i?) = sup„<j:j^<j:j(5'^(a, 6). Let e > 0. From (B.4) and (B.6) and the 
Markov inequality, we can choose some R = i?(e) so that the second term 
on the right-hand side of the last inequality is smaller than e/2. Also, (B.5) 
implies that the first term is smaller than e/2 for sufficiently large n. Thus, 
for any q> 0, 

(t>n{y) - (t>n{x)^^^ 0. 

Condition 3 gives also that, for any q> 0, 

(B.7) ^„(y)_^„(^)^l(^")0. 

From the Lipschitz continuity of x i— > 1 A e^', for any p > 2k, 

(B.8) E^,^ [1 A e<^"(^)-<>"(f )+^"] - E^,^ [1 A e^"] ^ 0. 

We now distinguish between RWM and SLA and the various step-sizes. 



32 



A. BESKOS, G. ROBERTS AND A. STUART 



The RWM algorithm. We use the expansion i?„ = i?^^^ = Ai^n + •^2,n ■ 
Un in Appendix A.l. 

Case (i): al = Pn-'^''-K 

For this step-size we have shown in Appendix A.l that ^2,n ~' ' ^ 



2 1+2k 

and Un — > in Li(7f„). Due to Condition 3, the same hmits will also hold in 
Li(7rn). Recah from (A.2) that Ai,n = a"nEr=iCi,n for ^i,„ = -X:[^g' {xi/ Xi)Zi. 
Due to Zi, for each n the process {Si^n}i'=i with Si^n = o"n Z]j=i '?«,n is a mar- 
tingale. Under vfn, the independence among ^i^n together with some tedious 
calculations give that 

1=1 

From Condition 3, the same limit holds in Li(7r„). The Martingale CLT 
from page 58 of [17] now gives that, under 7r„: 



1 + 2k, 

So, comparing with the results for the product case in Appendix A.l, Rn 
has the same limiting behavior under 7r.„ and 7f.„, implying that 

(B.9) ^^JlAe^"]^a(/) 

for the same a{l) as in the case when the target law is 7r„. Equations 
(B.8) and (B.9) give the required result. 
Case (ii): cj^ = Z^n~^^~-^n~'^ for e > 0. 

For this step-size Appendix A.l gives that ii„ ^ in Li(7r„) and Condi- 
tion 3 implies that the same limit holds also in Li(7r„). Equation (B.8) now 
provides the required result. 

Case (iii): al = Pn-^''-^n^ for e £ (0, 1). 

From Condition 3, it suffices to show that nPE^„ [1 A e<^"(^)-<?^"(y)+^"] ^ 0. 
In Appendix A.l we have shown that Ej^^[Rn] — > — oo as fast as — n^; also, 
for any integer g > 0, we have shown that £'^-^[(i?„ — Eji^[Rn])'^'^] = 0{n'^''^). 
From (B.7) , the same orders persist if we replace Rn with (f)n {x) — (pn {y) + Rn ■ 
The result now follows from Lemma B.l(ii). 

The SLA algorithm. We use the corresponding expansion Rn = A2,^n + 
A^^^n + A^^n + A^^n + U'n. The results for cases (ii) and (iii) are obtained 
exactly as in the case of RWM. For case (i), the Martingale CLT gives (as 
for RWM) that Rn has the same limiting behavior under both tt^ and TTn, 
and the required result then follows from (B.8). We avoid further details. 
□ 

Proof of Theorem 4. Note first that 
(B.IO) Sn = E[{x[, - Xi*f] = E[{yi* - x^*fl A e'^"(-)-'^"(^)+^"]. 
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Simple calculations give that, for any p > 2k, 

(B.ll) nPE[{y,* - x,,f] ^ l\ nPE[{yi* - x,,f]"^ ^ l\ 

Since nPE[{yi* —Xi*)^]^/"^ is n-uniformly bounded, the Lipschitz continuity of 
X I— > 1 A e^' and Cauchy-Schwarz inequality imply that any term appearing 
in the exponential in (B.IO) can be replaced with its L2(7rn)-limit when 
considering the limiting behavior of n'^Sn- To be more precise, we have 
shown, for instance, in the proof of Theorem 3 that (pniu) ~ ^Pnix) — > in 
L2i'^n) for any p > 2k. This gives 

< CnPE[{y,, - x,*)^]'/\E\^^^{y) - Mx)\Y^ ^ 0. 

For case (ii) of the proposition, we have shown in Appendix A that i?„ 
in L2{TTn), so also in L2{7Tn) from Condition 3. Now, ignoring (pnix) — 4>n{y) 
and Rn from the expression for Sn we get that n^Sn l"^ ■ For case (iii), we 
use Cauchy-Schwarz to get 

nPSn<nPE[{y,,-Xi*f]^I^E{an{x,yt/\ 

so the result follows from (B.ll) and Theorem 3(iii). We now focus on the 
case (T^ = Pn~^'^~^ and RWM (whence / = 1); the proof for SLA is similar. 
We use the expansion i?„ = Ai^n + •^2,n + in Appendix A.l. Let A\ „, 
^2 n' be the variables derived by omitting the i*th summand from the 
expansions for Ai^n-, -^2,™, Un-, respectively. We define i?* as the sum of 
these terms. From the analytical expressions in Appendix A.l, it is clear 
that i?„ - i?; ^ in L2(7r„). Thus: 

n2«+i5„ - r?'^+^E[{yi* - Xi*fl A e^"] ^ 0. 

We can now factorize: 

n2«+i£;[(yi. - Xi*fl A e^'* ] = fE[Zl] x E[l A e^"]. 

From the proof of Theorem 3 the last expectation, however, converges to 
a{l) and the required result is established. □ 

Proof of Corollary 1. One needs only to replace Aj with Ai^„ in 
all statements of Appendix A and the proofs of Theorems 1-4. When the 
constants i^'KWM^ j<^ShK appear in these proofs (where Aj = they are 
always divided with (1 + 2k), (1 + 6k), respectively; these values arise as the 
limits of n"(2''+^) X^Li and n~^^'''^^)YA=iK^ , respectively. Revisiting 
the proofs shows immediately that in the extended setting of the corollary 
one should now use lim„ n"(2'^+^) X^Li \> and lim„ n~(^''+^) X^Li \> 
the place of the above limits. □ 
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Proof of Proposition 1. For x,yeW^, let Xn = J2iLiX.,iei and 
Yn = J2iLiX.^iei. Then, from (4.8) and the equivalence between the norms 
IIXatIIs and \x\s, the latter defined in (2.5), we have 

\My)-Mx)\ = mYN)-HXN)\ 

< C{1 + \\XnE + \\Yn - XnI^Yn - XnWl, 
<Cil + \x\P + \y-x\P)\y-x\ 

for s = s(p) < 1/2. In this context k = 1 since the reference measure vf^ is of 
the structure nr=i-^(0,Af) with 



A,; 



'2 T 

]3n 



1 -1 



so clearly C_i ^ < Aj < C+i ^ for appropriate C_,C+ > 0. Thus, we have 
found that satisfies Condition 4. Then, for the case of RWM: 



n 



-2k-1 



2 n 



i=l 



i=l 



2r2 



N 



i=l 



6T^ ' 



so we have found r^^^. A similar calculation gives the required limit r^^^ 
for SLA. □ 



Proof of Proposition 2. In this case (j)n{x) =J2i=iG{Xt)At, for 
values Xt = Xt{x) from (4.9). We adjust appropriately the specification of 
the norms. So, for X = (Xj)^i with Xt G R'^ we define 

/ N \ l/P / N \ 1/2 

\\X\\L, = [j:\Xt\^^tj , \\X\U=[Y.i'^\x.fj . 

Let x = (xT^i^, x!|2) • • • > a^^Ar), y= {y^^i^y^,2^ ■ ■ ■ ^V^n) be elements of M", for 
n = d X N , and X = {Xt)^i, {Yt)^^ the corresponding discrete-time paths 
from (4.9). Then, working exactly as in (4.7) and (4.8), using the discrete 
version of the Sobolev embedding in Lemma B.3(ii), we obtain that 

\My) - <PnM < c{i + \\x\\p + \\Y - xr,)\\Y - x\\l, 

for some s = s{p) < 1/2. Note now that ||-^||l2 = 1^1 ^^^^ \\X\\s < C\x\s for 
arbitrary X, thus {4>n} satisfies Condition 4 for s = s{p) < 1/2 and s' = 0. 
In this context k = 1. Indeed, under 7r„, x ~ Yl^^iJ\f{0, Af ^) where 
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Using the fact that C_ < sin{v^)/v < C4. when v G (0,1) for constants 
C_,C+ > we get that C^i~^ < Aj^„, < C+i~^ for some (other) constants 
C_,C+ > 0. It remains to identify and t^^^. For the case of RWM: 



1=1 ^ 1=1 



SO r^^'^ is as stated in the proposition. One can similarly calculate t^^^. 
□ 

Proof of Proposition 3. For x,y G with the structure (5.5), with 
n = 4:N{N + 1), let Xn = Pn{x) and Yn = PN{y)- Recalhng the definition 
oi a = a{{i,j)) we can write for any real s, 

n/2 

il^ivii.'= E i^P^((0' + (4°^)')=Ei^"'«l''(4-i+4)- 

kGZl^ i=l 

From the particular ordering of the elements of x one can easily see that 

C-N^ < |c7~^(i)|2 < C+N^, when 2{No - l)No + 1 < i < 2iVo(^^o + 1) 

for some A'^o < A^. Now, for a given i and corresponding Nq = NQ{i) satisfying 
the second inequality it is true that C-.i^^'^ < Nq < C+i^/^, therefore 

(B.12) C-i<\a~^{i)\^ <C+i. 

This gives that 

C-\\Xn\\s<\x\,/2<C+\\Xn\\s. 

Using now (5.7) we obtain that 

\My) - = \4>{yN) - (t>{XN)\ < s{\\Xn\\so, WYnWsoWn - XnIUo 

< C6' {\x\,^/2,\y\so/2)\y - x\so/2 

for the locally bounded 6' = 6'{-, •) defined as S'{a, b) = supo<tj<a^o<«;<b'^(^; ^) 
for a, 6 > 0. Again from (5.7), 

= \HXn)\ < C{1 + \\XN\f) = C{1 + \xf). 

Thus, (pn satisfies Condition 5 for parameters s = s' = sq/2 and s" = 0; in 
this context k = a/2 since under 7f„, Xi ~ AA(0, A?) with 



A? = (47r2)-"|a-i(rV2l)|- 



'2a 
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So, from (B.12), C7_r°/2 < < C+r"/^. Also, 
-™ = limn-("+i)X^Ar2 = 4-(°+i)(47r2)"lim|iV-2("+i)2 ^ \k\'^ 



T 

n ^ I. ^ ' N \ 

«=1 fceZ^ 



-1 / UZ uz \a -I 



2 \ a 



'N<ki<N 
0<k2<N 



which gives the stated result for 7-^^'^. A similar calculation gives r^^^. □ 
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